Video decoder with inline downscaler

ABSTRACT

A device includes a memory configured to store video data. The device also includes a video decoder coupled to the memory and to a cache. The video decoder is configured to decode an input frame of the video data to generate a first video frame and includes an inline downscaler configured to generate a second video frame corresponding to the first video frame downscaled for display output.

I. FIELD

The present disclosure is generally related to decoding video data.

II. DESCRIPTION OF RELATED ART

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), ITU-T H.266/Versatile Video Coding (VVC) and extensions of such standards. Such video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques.

Conventionally, a video device can implement a video playback dataflow that includes a video decoder receiving an input data frame via a bitstream and also receiving reference frame pixel data from a dynamic random access memory (DRAM) to generate a reconstructed frame (a “decoded frame”) of the video, which is stored back into the DRAM. If the decoded frame has a higher resolution than is supported by a display device for video playback, a downscale pass is performed in which the decoded frame is transferred from the DRAM to a graphics processing unit (GPU) to generate a downscaled frame that matches a resolution supported by the display device.

For example, when a user selects to play an 8K video at a smart phone, the decoded video frames may have a resolution of 8192×4320 pixels, while the display may only be capable of supporting a resolution of 3120×1440 pixels. In this case, each decoded frame is stored into the DRAM, and a GPU performs a downscale pass to generate a downscaled frame, which is stored to the DRAM. Downscaled frames are read from the DRAM to a display processing unit (DPU) and provided to the display device via a display refresh.

Performance of the downscale pass, which can include waking the GPU for each frame of the video data, transferring decoded full-resolution frames from the DRAM to the GPU, downscaling at the GPU, and transferring the downscaled frames to the DRAM, consumes additional power, uses DRAM and GPU resources, and increases data traffic.

III. SUMMARY

According to a particular implementation of the techniques disclosed herein, a device includes a memory configured to store video data. The device also includes a video decoder coupled to the memory and to a cache. The video decoder is configured to decode an input frame of the video data to generate a first video frame and includes an inline downscaler configured to generate a second video frame corresponding to the first video frame downscaled for display output.

According to a particular implementation of the techniques disclosed herein, a method of processing video data includes obtaining, at a video decoder, an input frame of video data. The method includes decoding, at the video decoder, the input frame to generate a first video frame. The method also includes generating, at an inline downscaler of the video decoder, a second video frame corresponding to the first video frame downscaled for display output.

According to a particular implementation of the techniques disclosed herein, a non-transitory computer-readable medium includes instructions that, when executed by one or more processors, cause the one or more processors to decode, at the video decoder, an input frame of video data to generate a first video frame. The instructions, when executed by one or more processors, also cause the one or more processors to generate, at an inline downscaler of the video decoder, a second video frame corresponding to the first video frame downscaled for display output.

According to a particular implementation of the techniques disclosed herein, an apparatus includes means for obtaining an input frame of video data. The apparatus also includes means for decoding the input frame to generate a first video frame, the means for decoding including means for inline downscaling to generate a second video frame corresponding to the first video frame downscaled for display output.

Other implementations, advantages, and features of the present disclosure will become apparent after review of the entire application, including the following sections: Brief Description of the Drawings, Detailed Description, and the Claims.

IV. BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an example of an implementation of a system operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 2 is a block diagram illustrating an example of components of a video decoder that can be implemented in the system of FIG. 1, in accordance with some examples of the present disclosure.

FIG. 3A is a diagram illustrating an example of a video decoding operation that can be implemented in the system of FIG. 1, in accordance with some examples of the present disclosure.

FIG. 3B is a diagram illustrating an example of a video decoding operation that can be implemented in the system of FIG. 1, in accordance with some examples of the present disclosure.

FIG. 4 is a block diagram illustrating an example of a video decoding operation that can be implemented in the system of FIG. 1, in accordance with some examples of the present disclosure.

FIG. 5 is a block diagram illustrating an implementation of an integrated circuit operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 6 is a diagram of an implementation of a portable electronic device operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 7 is a diagram of a camera operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 8 is a diagram of a wearable electronic device operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 9 is a diagram of an extended reality device, such as augmented reality glasses, operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 10 is a diagram of a headset, such as a virtual reality, mixed reality, or augmented reality headset, operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 11 is a diagram of a voice-controlled speaker system operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 12 is a diagram of a first example of a vehicle operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 13 is a diagram of a second example of a vehicle operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 14 is a diagram of a particular implementation of a method of processing video data using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

FIG. 15 is a block diagram of a particular illustrative example of a device that is operable to perform video decoding using a video decoder with an inline downscaler, in accordance with some examples of the present disclosure.

V. DETAILED DESCRIPTION

Systems and methods to perform video decoding using a video decoder with an inline downscaler are disclosed. In conventional video decoding techniques, when a decoded frame has a higher resolution than is supported by a display device for video playback, a downscale pass is performed in which the decoded frame is transferred from DRAM to a GPU to generate a downscaled frame that matches a resolution supported by the display device. Performance of the downscale pass can include waking the GPU for each frame of the video data, transferring decoded full-resolution frames from the DRAM to the GPU, downscaling at the GPU, and transferring the downscaled frame to the DRAM, which consumes power, uses DRAM and GPU resources, and increases data traffic.

The disclosed systems and methods include techniques to bypass GPU downscale processing of full-resolution video frames by using an inline downscaler at the video decoder to generate downscaled frames. Because the video decoder can generate full-resolution and downscaled versions of each video frame, the downscaled versions of the video frames output by the video decoder can be provided to a display unit for output without accessing the GPU. As a result, use of the inline downscaler in the video decoder provides the technical advantages of generating reduced resolution video frames for playout without incurring the power consumption, DRAM and GPU resource consumption, and data traffic that result from downscaling using the conventional GPU downscaling pass.

In some aspects, the downscaled frames are stored at a system cache/on-chip memory instead of at the DRAM and retrieved from the system cache/on-chip memory for playout. Using the system cache/on-chip memory for storage and retrieval of downscaled video frames provides the technical advantage of reducing DRAM usage and data traffic associated with transferring the frames into and out of the DRAM.

According to an aspect, additional benefits are obtained by selectively storing the full-resolution video frames generated by the video decoder into the DRAM based on whether the full-resolution video frames are reference frames that will later be used to decode another video frame. For example, the video decoder may decode input frames from a bitstream that also includes indications of which input frames are reference frames. The video decoder may store a particular full-resolution video frame to the DRAM only if the bitstream indicates that the corresponding input frame is a reference frame; otherwise, the full-resolution video frame may be discarded. As a result, storage of full-resolution non-reference frames to the DRAM is skipped, providing the technical advantage of further reducing memory usage, data traffic, and power consumption associated with video decoding and playback.

Particular aspects of the present disclosure are described below with reference to the drawings. In the description, common features are designated by common reference numbers. As used herein, various terminology is used for the purpose of describing particular implementations only and is not intended to be limiting of implementations. For example, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Further, some features described herein are singular in some implementations and plural in other implementations. To illustrate, FIG. 1 depicts a device 102 including one or more processors (“processor(s)” 116 of FIG. 1), which indicates that in some implementations the device 102 includes a single processor 116 and in other implementations the device 102 includes multiple processors 116. For ease of reference herein, such features are generally introduced as “one or more” features and are subsequently referred to in the singular or optional plural (as indicated by “(s)” in the name of the feature) unless aspects related to multiple of the features are being described.

It may be further understood that the terms “comprise,” “comprises,” and “comprising” may be used interchangeably with “include,” “includes,” or “including.” Additionally, it will be understood that the term “wherein” may be used interchangeably with “where.” As used herein, “exemplary” may indicate an example, an implementation, and/or an aspect, and should not be construed as limiting or as indicating a preference or a preferred implementation. As used herein, an ordinal term (e.g., “first,” “second,” “third,” etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having a same name (but for use of the ordinal term). As used herein, the term “set” refers to one or more of a particular element, and the term “plurality” refers to multiple (e.g., two or more) of a particular element.

As used herein, “coupled” may include “communicatively coupled,” “electrically coupled,” or “physically coupled,” and may also (or alternatively) include any combinations thereof. Two devices (or components) may be coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) directly or indirectly via one or more other devices, components, wires, buses, networks (e.g., a wired network, a wireless network, or a combination thereof), etc. Two devices (or components) that are electrically coupled may be included in the same device or in different devices and may be connected via electronics, one or more connectors, or inductive coupling, as illustrative, non-limiting examples. In some implementations, two devices (or components) that are communicatively coupled, such as in electrical communication, may send and receive signals (e.g., digital signals or analog signals) directly or indirectly, via one or more wires, buses, networks, etc. As used herein, “directly coupled” may include two devices that are coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) without intervening components.

In the present disclosure, terms such as “obtaining,” “determining,” “calculating,” “estimating,” “shifting,” “adjusting,” etc. may be used to describe how one or more operations are performed. It should be noted that such terms are not to be construed as limiting and other techniques may be utilized to perform similar operations. Additionally, as referred to herein, “obtaining,” “generating,” “calculating,” “estimating,” “using,” “selecting,” “accessing,” and “determining” may be used interchangeably. For example, “obtaining,” “generating,” “calculating,” “estimating,” or “determining” a parameter (or a signal) may refer to actively generating, estimating, calculating, or determining the parameter (or the signal) or may refer to using, selecting, retrieving, receiving, or accessing the parameter (or signal) that is already generated, such as by another component or device.

Referring to FIG. 1, a particular illustrative aspect of a system 100 is depicted that includes a device 102 that is coupled to a display device 104 and that is configured to perform video decoding using a video decoder 124 with an inline downscaler 126. The device 102 includes a memory 110 coupled to one or more processors 116 and configured to store instructions 112 and video data. For example, the memory 110 may include encoded video data 122, one or more decoded frames 134 associated with the encoded video data 122, one or more downscaled frames or portions of downscaled frames, or any combination thereof, as described in further detail below. In a particular implementation, the memory 110 corresponds to a DRAM of a double data rate (DDR) memory subsystem.

The one or more processors 116 are configured to execute the instructions 112 to perform operations associated with decoding encoded video data 122 at the video decoder 124. In various implementations, some or all of the functionality associated with the video decoder 124 is performed via execution of the instructions 112 by the one or more processors 116, performed by processing circuitry of the one or more processors 116 in a hardware implementation, or a combination thereof.

The one or more processors 116 include the video decoder 124 coupled to an encoded data source 120. The video decoder 124 is also coupled to a system cache/on-chip memory 150, which is also referred to herein as a cache 150. The video decoder 124 is configured to obtain the encoded video data 122 from the encoded data source 120. For example, the encoded data source 120 may correspond to a portion of one or more media files (e.g., a media file including the encoded video data 122 that is retrieved from the memory 110), a game engine, one or more other sources of video information, such as a remote media server, or a combination thereof.

In a particular implementation, the cache 150 and the video decoder 124 are integrated into a single substrate 190 (e.g., a single chip). Although the cache 150 is illustrated as distinct from and coupled to the video decoder 124, in other examples the cache 150 is integrated in the video decoder 124. According to an aspect, the cache 150 includes a static random access memory (SRAM).

The video decoder 124 is configured to decode an input frame of the video data 122 to generate a first video frame, illustrated as a decoded frame 130, and includes the inline downscaler 126 configured to generate a second video frame corresponding to the first video frame downscaled for display output, illustrated as a downscaled frame 132. For example, the video decoder 124 may be configured to receive a first input frame 162 in the encoded video data 122 and to process the first input frame 162 to generate the decoded frame 130 that has a first resolution (e.g., 8192×4320 pixels) that is determined by the encoded video data 122 and that is used in conjunction with decoding other frames of the encoded video data 122. To illustrate, the encoded video data 122 can include motion vectors that are associated with one or more previously decoded reference frames that are based on the first resolution and used for pixel prediction. However, the first resolution may be too large for the display device 104. For example, the display device 104 may support resolutions up to 3120×1440 pixels and may therefore be unable to display the decoded frame 130.
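
As an illustrative, non-limiting sketch of how a display-compatible target resolution could be derived from the example resolutions above, the following Python snippet computes the largest resolution that fits within the display limits. The function name `fit_within` and the aspect-ratio-preserving policy are assumptions made for illustration and are not taken from the disclosure, which only requires that the resulting resolution be supported by the display device 104.

```python
from fractions import Fraction


def fit_within(src_w, src_h, max_w, max_h):
    """Return the largest (width, height) that fits inside max_w x max_h
    while preserving the source aspect ratio (an assumed policy)."""
    # Exact rational arithmetic avoids floating-point rounding surprises.
    # The final Fraction(1) prevents upscaling a frame that already fits.
    scale = min(Fraction(max_w, src_w), Fraction(max_h, src_h), Fraction(1))
    # Truncate to whole pixels; never return a zero dimension.
    out_w = max(1, int(src_w * scale))
    out_h = max(1, int(src_h * scale))
    return out_w, out_h


decoded_resolution = (8192, 4320)   # example first resolution
display_limit = (3120, 1440)        # example display limit
print(fit_within(*decoded_resolution, *display_limit))
```

Under these assumptions the limiting dimension is the height (a scale factor of 1/3), so the example prints `(2730, 1440)`. A practical downscaler might additionally round to even dimensions or to a block multiple; that choice is outside the scope of this sketch.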

The inline downscaler 126 is configured to generate the downscaled frame 132 that corresponds to a downscaled version of the decoded frame 130 and that has a resolution that is supported by the display device 104. The decoded frame 130 and the downscaled frame 132 may be output by the video decoder 124 in parallel. For example, the downscaled frame 132 may be output to the cache 150 for storage and later retrieval by a display unit (e.g., a DPU) 140, and in parallel, the decoded frame 130 may be output to the memory 110 for storage, such as included with the decoded frame(s) 134. The decoded frame(s) 134 may include one or more reference frames, one or more non-reference frames, or a combination thereof. Additional details regarding operation of the video decoder 124 and the inline downscaler 126 are provided with reference to FIG. 2.
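
The following Python sketch illustrates, in simplified form, the idea of producing a full-resolution frame and a downscaled frame from the same reconstructed pixel data. The box (averaging) filter, the integer scale factor, and the function names `box_downscale` and `decode_with_inline_downscale` are assumptions chosen for brevity; the disclosure does not specify the filter used by the inline downscaler 126.

```python
def box_downscale(frame, factor):
    """Average each factor x factor tile of a 2-D list of samples.

    Rows and columns that do not fill a complete tile are dropped.
    """
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            tile = [frame[y + dy][x + dx]
                    for dy in range(factor)
                    for dx in range(factor)]
            row.append(sum(tile) // len(tile))  # integer average
        out.append(row)
    return out


def decode_with_inline_downscale(reconstructed, factor):
    """Return (full-resolution frame, downscaled frame) for one input frame."""
    return reconstructed, box_downscale(reconstructed, factor)


samples = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [10, 10, 20, 20],
    [10, 10, 20, 20],
]
full, small = decode_with_inline_downscale(samples, 2)
print(small)
```

In this sketch both outputs are derived from the reconstructed samples without any intermediate round trip through external memory, which is the property the inline arrangement is intended to provide.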

Optionally, the video decoder 124 may be configured to selectively store the decoded frame 130 into the memory 110 based on whether the decoded frame 130 corresponds to a reference frame. To illustrate, the encoded video data 122 may include information that indicates which frames are reference frames, and the video decoder 124 may use the information to determine whether to store the decoded frame 130 or discard the decoded frame 130. For example, the decoded frame 130 may be stored into the memory 110 based on a determination that the decoded frame 130 is a reference frame, and storage of full-resolution non-reference frames to the memory 110 may be skipped. Skipping storage of full-resolution non-reference frames reduces the memory access bandwidth, the power consumption, and the storage capacity of the memory 110 that are used during decoding.
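
The selective storage described above can be sketched as follows. The `DecodedFrame` and `Memory` classes and the `is_reference` flag are hypothetical stand-ins introduced for illustration; in particular, how reference-frame information is signaled depends on the coding standard and is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class DecodedFrame:
    index: int
    is_reference: bool  # hypothetical flag, assumed to be parsed from the bitstream
    pixels: bytes = b""


@dataclass
class Memory:
    """Stand-in for the memory 110 (e.g., DRAM)."""
    frames: dict = field(default_factory=dict)
    bytes_written: int = 0

    def store(self, frame: DecodedFrame) -> None:
        self.frames[frame.index] = frame
        self.bytes_written += len(frame.pixels)


def store_if_reference(frame: DecodedFrame, memory: Memory) -> bool:
    """Store the full-resolution frame only if it is a reference frame."""
    if frame.is_reference:
        memory.store(frame)
        return True
    return False  # non-reference frame: the full-resolution write is skipped


memory = Memory()
frames = [
    DecodedFrame(0, True, bytes(16)),
    DecodedFrame(1, False, bytes(16)),
    DecodedFrame(2, True, bytes(16)),
]
stored = [store_if_reference(f, memory) for f in frames]
print(stored, memory.bytes_written)
```

In this toy run only two of the three frames are written, so the modeled write traffic is 32 bytes instead of 48.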

The display unit 140 is configured to receive the downscaled frame 132 and to generate a video data output 142, such as a display refresh, to the display device 104. To illustrate, the display unit 140 is configured to receive the downscaled frame 132 from the cache 150. The display unit 140 may also retrieve additional data from the memory 110 for use in conjunction with processing the downscaled frame 132 (e.g., layer composition) to generate the video data output 142.

The display device 104 is configured to display the video data output 142, which is based on the downscaled frame 132. For example, the video data output 142 can include a reconstructed frame that is based on the downscaled frame 132 for viewing by a user of the device 102.

The device 102 optionally includes a modem 118 that is coupled to the one or more processors 116 and configured to enable communication with one or more other devices, such as via one or more wireless networks. According to some aspects, the modem 118 is configured to receive the encoded video data 122 from a second device, such as video data that is streamed via a wireless transmission 194 from a remote device 192 (e.g., a remote server) for playback at the device 102.

During operation, the encoded video data 122 may be received at the video decoder 124 as a bitstream that includes a sequence of frames including a first input frame 162, a second input frame 164, and one or more additional input frames including an Nth input frame 166 (N is a positive integer). The encoded video data 122 is processed by the video decoder 124 to generate full-resolution decoded frames and downscaled frames for each of the input frames 162-166. For example, the video decoder 124 processes the first input frame 162 to generate the full-resolution decoded frame 130 and also to generate the downscaled frame 132 via operation of the inline downscaler 126. The video decoder 124 stores the downscaled frame 132 into the cache 150, and the downscaled frame 132 is retrieved from the cache 150 and provided to the display unit 140 for output at the display device 104.

The full-resolution decoded frames generated by the video decoder 124, such as the decoded frame 130, may be stored into the memory 110. In some implementations, however, only the full-resolution decoded frames that are reference frames are stored to the memory 110, and the rest of the full-resolution decoded frames that are not reference frames are discarded (e.g., erased or overwritten) at the video decoder 124.

The downscaled frames generated by the inline downscaler 126 in the video decoder 124 are stored into the cache 150, and later retrieved from the cache 150 by the display unit 140 for generation of the video data output 142. For example, the downscaled frame 132 may be generated at the video decoder 124 and transferred to the display unit 140 via the cache 150 without being stored at the memory 110 or processed at the GPU 160. However, in some cases, such as based on the size of the cache 150 and a management policy used by the device 102, the downscaled frame 132 may be evicted from the cache 150 and stored into the memory 110. In such cases, in response to a request for the downscaled frame 132 resulting in a cache miss, the downscaled frame 132 may be retrieved from the memory 110 and provided to the display unit 140. The display unit 140 generates the video data output 142 based on the downscaled frame 132 and provides the video data output 142 to the display device 104 for playout, such as to a user of the device 102.
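
A minimal sketch of the cache path with fallback to memory is shown below. The `FrameStore` class, the oldest-first eviction policy, and the frame-count capacity are assumptions for illustration only; the actual management policy of the cache 150 is not specified in this description.

```python
from collections import OrderedDict


class FrameStore:
    """Toy model of the cache 150 backed by the memory 110."""

    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity  # capacity in frames (assumed unit)
        self.cache = OrderedDict()
        self.memory = {}

    def put_downscaled(self, frame_id, data):
        """Video decoder side: write a downscaled frame into the cache."""
        self.cache[frame_id] = data
        # Evict the oldest entries to memory when the cache is over capacity.
        while len(self.cache) > self.cache_capacity:
            old_id, old_data = self.cache.popitem(last=False)
            self.memory[old_id] = old_data

    def fetch_for_display(self, frame_id):
        """Display unit side: read from the cache, or from memory on a miss."""
        if frame_id in self.cache:
            return self.cache[frame_id], "cache"
        return self.memory[frame_id], "memory"


store = FrameStore(cache_capacity=2)
for i in range(3):
    store.put_downscaled(i, f"frame{i}".encode())
print(store.fetch_for_display(2)[1], store.fetch_for_display(0)[1])
```

Here the third write evicts frame 0 to memory, so a later request for frame 0 misses in the cache and is served from memory, while frame 2 is served directly from the cache.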

Generation of the downscaled frame 132 at the inline downscaler 126 in the video decoder 124 enables reduced usage of the storage capacity of the memory 110, reduced memory bandwidth associated with data transfer into and out of the memory 110, and reduced power consumption associated with the memory 110 and the GPU 160 as compared to a conventional technique in which the downscaled frame 132 is generated by the GPU 160 accessing the decoded frame 130 from the memory 110 and processing the decoded frame 130 to generate the downscaled frame 132, which is then stored back to the memory 110. Use of the inline downscaler 126 thus provides improved power efficiency (e.g., reduced power consumption by reducing/bypassing operations at the memory 110 and the GPU 160), improved memory bandwidth efficiency (e.g., reduced data transfer into and out of the memory 110), and improved end-to-end efficiency from the video decoder 124 to the display device 104.

In addition, storage of the downscaled frame 132 at the cache 150 enables the downscaled frame 132 to be conveyed from the video decoder 124 to the display unit 140 without being stored into, and later retrieved from, the memory 110. As a result, usage of the storage capacity of the memory 110, memory bandwidth associated with data transfer into and out of the memory 110, and power consumption associated with the memory 110 are reduced as compared to storing the downscaled frame 132 into the memory 110 and later reading the downscaled frame 132 from the memory 110.

Table 1 provides illustrative, non-limiting examples of memory bandwidth savings and power savings that may be obtained via use of the disclosed techniques as compared to conventional decoding techniques in which the decoded frame 130 is stored to the memory 110, processed by the GPU 160 in a downscaling pass to generate a downscaled frame that is stored into the memory 110, and the downscaled frame is read from the memory 110 to the display unit 140.

TABLE 1

Display Resolution   Video Resolution      Bandwidth Savings   Power Savings
3120 × 1440          8192 × 4320, 10-bit   1165 MBps           124 mW
3120 × 1440          7680 × 4320, 8-bit    819 MBps            92 mW

As shown in Table 1, for an 8192×4320 (10-bit) video resolution that is downscaled by the inline downscaler 126 to a 3120×1440 display resolution, the present techniques can result in a 1165 megabytes-per-second (MBps) reduction in memory bandwidth and a 124 milliwatt (mW) reduction in power consumption as compared to conventional techniques that use the GPU 160 in a downscaling pass.

Although specific examples of resolutions are described herein, such as 8192×4320 pixels as an example of the first resolution of the decoded frames and 3120×1440 pixels as an example of the largest resolution supported by the display device 104, these resolution examples are provided for purpose of illustration only and should not be construed as limiting. In general, the techniques described herein can be applied to decoded frames of any resolution to generate lower-resolution frames. In addition, the largest resolution supported by the display device 104 may also be any resolution. Although the present techniques can be used when the largest resolution supported by the display device 104 is less than the resolution of the decoded frames, the present techniques can also be used to generate lower-resolution video for playout even when the display device 104 is capable of playback using the original resolution of the decoded frames. For example, lower-resolution video playback may be selected (e.g., selected by user input via a user interface, or selected by a power/performance management policy of the device 102) to reduce power consumption as compared to video playback at the resolution of the decoded frames.

Although FIG. 1 illustrates that the decoded frame 130 is stored into the memory 110 and the downscaled frame 132 is stored into the cache 150, in other implementations the downscaled frame 132 may instead be stored into the memory 110, such as described further with reference to FIG. 3A and FIG. 3B. To illustrate, the video decoder 124 may store both the decoded frame 130 and the downscaled frame 132 into the memory 110 and the display unit 140 may retrieve the downscaled frame 132 from the memory 110, such as described in further detail with reference to FIG. 3B.

For example, in some implementations, the video decoder 124 is configured to select between storing the downscaled frame 132 into the memory 110 or into the cache 150. To illustrate, the video decoder 124 may make the selection based on cache size and/or availability for storage of decoded pixel data, based on a user setting, based on a configuration setting that indicates a product tier, or any combination thereof. For example, a “value” tier may indicate that the device 102 has a relatively small cache 150, in which case the video decoder 124 may select to store downscaled video frames to the memory 110, while a “premium” tier may indicate that the device 102 has a relatively large cache 150, in which case the video decoder 124 may select to store downscaled video frames into the cache 150. The display unit 140 may also be configured to selectively receive the downscaled frame 132 from the memory 110 or from the cache 150 based on the storage location of the downscaled frame 132. To illustrate, the display unit 140 may select to retrieve the downscaled frame 132 from the memory 110 or from the cache 150 based on cache size and/or availability, based on a user setting, based on a configuration setting (e.g., indicating a tier of the device 102), or any combination thereof.
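
One way to picture the selection described above is the following Python sketch. The `Target` enumeration, the `select_target` function, its parameters, and especially the priority order among the criteria (user setting first, then product tier, then cache availability) are assumptions made for illustration; the description permits any combination of these criteria.

```python
from enum import Enum


class Target(Enum):
    CACHE = "cache"    # system cache/on-chip memory 150
    MEMORY = "memory"  # memory 110 (e.g., DRAM)


def select_target(frame_bytes, cache_free_bytes, tier=None, user_setting=None):
    """Choose where to store a downscaled frame (assumed priority order)."""
    if user_setting is not None:
        return user_setting            # an explicit user setting wins
    if tier == "value":
        return Target.MEMORY           # relatively small cache assumed
    if frame_bytes <= cache_free_bytes:
        return Target.CACHE            # frame fits in the available cache
    return Target.MEMORY               # otherwise fall back to memory


print(select_target(1_000, 4_000, tier="premium"))
print(select_target(5_000, 4_000, tier="premium"))
print(select_target(1_000, 4_000, tier="value"))
```

The display unit 140 could apply the same function, or simply be told the chosen target, so that it reads each downscaled frame from the location where that frame was stored.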

In some implementations, the video decoder 124 is configured to store a first portion of the downscaled frame 132 into the memory 110 and to store a second portion of the downscaled frame 132 into the cache 150, such as described in further detail with reference to FIG. 4. In such implementations, the display unit 140 may similarly be configured to receive the first portion of the downscaled frame 132 from the memory 110 and the second portion of the downscaled frame 132 from the cache 150 and combine the portions during generation of the video data output 142.
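
The split storage described above can be sketched as follows. Splitting by rows, sizing the cache portion by a row budget, and the function names `split_rows` and `combine_rows` are assumptions for illustration; the description does not specify how the two portions are delimited.

```python
def split_rows(frame_rows, cache_row_budget):
    """Split a frame (a list of rows) into (memory_portion, cache_portion).

    The last `cache_row_budget` rows go to the cache; the rest go to memory.
    """
    budget = max(0, min(cache_row_budget, len(frame_rows)))
    split_at = len(frame_rows) - budget
    return frame_rows[:split_at], frame_rows[split_at:]


def combine_rows(memory_portion, cache_portion):
    """Display-unit side: reassemble the frame from its two portions."""
    return memory_portion + cache_portion


frame = [[row] * 4 for row in range(6)]          # 6 rows of 4 samples
first_portion, second_portion = split_rows(frame, cache_row_budget=2)
assert combine_rows(first_portion, second_portion) == frame
print(len(first_portion), len(second_portion))
```

With a cache budget of two rows, four rows are written to memory and two to the cache, and the display-side recombination reproduces the original frame.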

According to some aspects, the one or more processors 116 are integrated in at least one of a mobile phone or a tablet computer device, such as illustrated in FIG. 6, or a wearable electronic device, such as illustrated in FIG. 8. According to some aspects, the one or more processors 116 are integrated in a camera device, such as illustrated in FIG. 7, or a voice-controlled speaker system, such as illustrated in FIG. 11. According to some aspects, the one or more processors 116 are integrated in an extended reality headset device that is configured to display an output based on the downscaled frame 132, such as illustrated in FIG. 9 and FIG. 10. According to some aspects, the one or more processors 116 are integrated in a vehicle that also includes a display device configured to display an output based on the downscaled frame 132, such as illustrated in FIG. 12 and FIG. 13.

Although in some implementations the input frames 162-166 of the encoded video data 122 may all have the same resolution, in other implementations the present techniques can also be performed in an adaptive resolution decoding environment. Although the display device 104 is illustrated as included in (e.g., integrated with) the device 102, in other implementations the display device 104 may be coupled to, but not included in, the device 102. Although the modem 118 is illustrated as included in the device 102, in other examples the modem 118 may be omitted.

FIG. 2 depicts an illustrative example 200 including components that may be implemented in the video decoder 124. In the example 200, the video decoder 124 includes a bitstream parsing unit 210, a pixel prediction processing unit 212, an inverse transform processing unit 214, a pixel reconstruction and inloop filtering unit 216, a reference picture buffer 218, the inline downscaler 126, and a display picture buffer 220. In a particular implementation, the bitstream parsing unit 210, the pixel prediction processing unit 212, the inverse transform processing unit 214, the pixel reconstruction and inloop filtering unit 216, the reference picture buffer 218, the inline downscaler 126, the display picture buffer 220, or any combination thereof, may be implemented in one or more processors or in processing circuitry.

The various units shown in FIG. 2 are illustrated to assist with understanding the operations performed by the video decoder 124 in accordance with some implementations. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Fixed-function circuits refer to circuits that provide particular functionality, and are preset on the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks, and provide flexible functionality in the operations that can be performed. For instance, programmable circuits may execute software or firmware that causes the programmable circuits to operate in the manner defined by instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, the one or more units may be integrated circuits.

In general, the video decoder 124 reconstructs a picture on a block-by-block basis. The video decoder 124 may perform a reconstruction operation on each block individually (where the block currently being reconstructed, i.e., decoded, may be referred to as a “current block”).

The bitstream parsing unit 210 receives the encoded video data 122 and may entropy decode the encoded video data 122 to reproduce syntax elements. The pixel prediction processing unit 212, the inverse transform processing unit 214, and the pixel reconstruction and inloop filtering unit 216 may generate decoded video data based on the syntax elements extracted from the bitstream. In some implementations, the bitstream parsing unit 210 may decode information indicating which frames in the bitstream 222 are reference frames, which the video decoder 124 may use to determine which full-resolution decoded frames to store into the memory 110.

The bitstream parsing unit 210 may entropy decode syntax elements defining quantized transform coefficients of a quantized transform coefficient block, as well as transform information, such as a quantization parameter (QP) and/or transform mode indication(s). The QP associated with the quantized transform coefficient block may be used to determine a degree of quantization and a degree of inverse quantization to apply. In an example, a bitwise left-shift operation may be performed to inverse quantize the quantized transform coefficients, and a transform coefficient block including transform coefficients may be formed.
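As an illustrative, non-limiting sketch of the left-shift example above, the following Python fragment scales each quantized coefficient by a power of two. The function name and the `shift` parameter are hypothetical; how a shift amount would be derived from the QP is codec-specific and is not specified here.

```python
def inverse_quantize(levels, shift):
    """Scale quantized coefficient levels by 2**shift using a left shift.

    `shift` is a hypothetical parameter standing in for a value that a
    real decoder would derive from the quantization parameter (QP).
    """
    if shift < 0:
        raise ValueError("shift must be non-negative")
    # In Python, << on a negative integer multiplies by 2**shift and keeps the sign.
    return [[level << shift for level in row] for row in levels]


# A 2x2 quantized transform coefficient block (toy values).
quantized_block = [[3, -1], [0, 2]]
coeff_block = inverse_quantize(quantized_block, shift=2)
```

The resulting `coeff_block` plays the role of the transform coefficient block that is passed on to the inverse transform stage.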

The pixel prediction processing unit 212 may include one or more units to perform prediction in accordance with one or more prediction modes. As examples, the pixel prediction processing unit 212 may include a motion compensation unit, an inter-prediction unit, an intra-prediction unit, a palette unit, an affine unit, a linear model (LM) unit, one or more other units configured to perform prediction, or a combination thereof.

In addition, the pixel prediction processing unit 212 generates a prediction block according to prediction information syntax elements that were entropy decoded by the bitstream parsing unit 210. For example, if the prediction information syntax elements indicate that the current block is inter-predicted, a motion compensation unit (not shown) may generate the prediction block. In this case, the prediction information syntax elements may indicate a reference picture in the reference picture buffer 218 from which to retrieve a reference block, as well as a motion vector identifying a location of the reference block in the reference picture relative to the location of the current block in the current picture.
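To illustrate how a motion vector may locate a reference block relative to the current block, the following sketch fetches a block from a reference picture held as a list of rows. It assumes integer-sample motion vectors and clamps coordinates at the picture edges; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def fetch_reference_block(ref_picture, cur_x, cur_y, mv_x, mv_y, width, height):
    """Return the width x height block at (cur_x + mv_x, cur_y + mv_y).

    Coordinates outside the picture are clamped to the nearest edge sample
    (an assumed padding rule, not taken from the description).
    """
    rows = len(ref_picture)
    cols = len(ref_picture[0])
    block = []
    for dy in range(height):
        y = min(max(cur_y + mv_y + dy, 0), rows - 1)
        row = []
        for dx in range(width):
            x = min(max(cur_x + mv_x + dx, 0), cols - 1)
            row.append(ref_picture[y][x])
        block.append(row)
    return block


# Toy 4x4 reference picture whose sample at (x, y) equals y * 4 + x.
reference = [[y * 4 + x for x in range(4)] for y in range(4)]
prediction = fetch_reference_block(reference, cur_x=2, cur_y=2,
                                   mv_x=-1, mv_y=-1, width=2, height=2)
```

A practical motion compensation unit would typically also support fractional-sample motion vectors with interpolation filtering, which this sketch omits.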

As another example, if the prediction information syntax elements indicate that the current block is intra-predicted, an intra-prediction unit (not shown) may generate the prediction block according to an intra-prediction mode indicated by the prediction information syntax elements. The pixel prediction processing unit 212 may retrieve data of neighboring samples to the current block from the reference picture buffer 218.

The pixel prediction processing unit 212 may also determine to decode blocks of video data using an intra block copy (IBC) mode. In general, in IBC mode, the video decoder 124 may determine predictive blocks for a current block, where the predictive blocks are in the same frame as the current block. The predictive blocks may be identified by a block vector (e.g., a motion vector) and limited to the locations of blocks that have already been decoded.

The inverse transform processing unit 214 may apply one or more inverse transforms to the transform coefficient block from the bitstream parsing unit 210 to generate a residual block associated with the current block. For example, the inverse transform processing unit 214 may apply an inverse discrete cosine transform (DCT), an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the coefficient block.
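As one hedged illustration of an inverse transform, the following sketch applies a one-dimensional orthonormal inverse DCT (the inverse of the DCT-II) to a row of coefficients using floating-point arithmetic. Video codecs generally use integer approximations applied separably to rows and columns, so this fragment is a conceptual aid rather than a description of the inverse transform processing unit 214.

```python
import math


def inverse_dct_1d(coeffs):
    """Orthonormal 1-D inverse DCT: reconstructs samples from DCT-II coefficients."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            # Scale factor is sqrt(1/n) for the DC term and sqrt(2/n) otherwise.
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        samples.append(total)
    return samples


# A DC-only coefficient row yields a flat residual row.
residual_row = inverse_dct_1d([2.0, 0.0, 0.0, 0.0])
```

With only the DC coefficient set, every reconstructed sample has the same value, which matches the intuition that the DC term carries the block average.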

The pixel reconstruction and inloop filtering unit 216 may reconstruct the current block using the prediction block and the residual block. For example, the pixel reconstruction and inloop filtering unit 216 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the current block.
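The sample-wise addition described above can be sketched as follows. The clipping of each sum to the valid sample range and the `bit_depth` parameter are assumptions added for illustration; the function name is hypothetical.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add residual samples to prediction samples and clip to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


prediction_block = [[100, 250], [5, 128]]
residual_block = [[10, 20], [-9, 0]]
current_block = reconstruct_block(prediction_block, residual_block)
```

In this toy example, the sums 270 and -4 fall outside the 8-bit range and are clipped to 255 and 0, respectively.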

The pixel reconstruction and inloop filtering unit 216 may perform one or more filter operations on reconstructed blocks. For example, the pixel reconstruction and inloop filtering unit 216 may access reconstructed blocks and perform deblocking operations to reduce blockiness artifacts along edges of the reconstructed blocks. Operations of the pixel reconstruction and inloop filtering unit 216 are not necessarily performed in all examples.

The video decoder 124 may store the reconstructed blocks in the reference picture buffer 218, which may be implemented as, coupled to, or include the cache 150, the off-chip memory 110, or both, for larger storage capacity. The reference picture buffer 218 generally stores decoded pictures, illustrated as a reference frame 230 (e.g., the decoded frame 130), which the video decoder 124 may output, use as reference video data when decoding subsequent data or pictures of the encoded video bitstream, or both. As discussed above, the reference picture buffer 218 may provide reference information, such as samples of a current picture for intra-prediction and previously decoded pictures for subsequent motion compensation, to the pixel prediction processing unit 212. Such reference information may be provided from the memory 110 to the pixel prediction processing unit 212. In implementations in which reference information is stored at the cache 150, the reference information may be provided from the cache 150 to the pixel prediction processing unit 212 via a path indicated by the dotted line.

The inline downscaler 126 is configured to process pixels stored in the reference picture buffer 218 to generate downscaled versions that are saved as downscaled frames (e.g., the downscaled frame 132) in the display picture buffer 220. In a particular implementation, the inline downscaler 126 includes a combination of a four-tap filter based downscaler, and a bilinear 2:1 downscaler for larger downscaling ratios. Alternatively, or in addition, in other implementations one or more other types of downscalers may be used in the inline downscaler 126, and may be selected based on one or more power/area and quality criteria.
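As a minimal sketch of a 2:1 downscaler, the fragment below averages each 2x2 group of input samples with rounding, which is what bilinear interpolation produces when the output sample lies at the center of the four inputs. This is one plausible reading of "bilinear 2:1" and is an assumption; the four-tap filter stage is not shown, and the function name is hypothetical.

```python
def downscale_2to1(frame):
    """Halve width and height by averaging each 2x2 group of samples (rounded)."""
    out_rows = len(frame) // 2
    out_cols = len(frame[0]) // 2
    out = []
    for y in range(out_rows):
        out_row = []
        for x in range(out_cols):
            total = (frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
                     + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1])
            out_row.append((total + 2) >> 2)  # add 2 before shifting to round
        out.append(out_row)
    return out


full_res = [
    [10, 20, 30, 40],
    [30, 40, 50, 60],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
half_res = downscale_2to1(full_res)
quarter_res = downscale_2to1(half_res)  # repeated passes give larger ratios
```

Applying the 2:1 pass repeatedly, as in the last line, is one simple way to reach larger downscaling ratios such as 4:1.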

The video decoder 124 may output decoded pictures from the display picture buffer 220 for subsequent presentation on a display device. For example, the downscaled frame 132 is output from the display picture buffer 220 to the cache 150, and is retrieved from the cache 150 by the display unit 140 to generate the video data output 142 provided to the display device 104 of FIG. 1 . In some implementations, the downscaled frame 132 may be saved to the memory 110 through the cache 150, depending on a particular usage and/or configuration of the cache 150, and may be retrieved in response to a cache miss by the display unit 140.

FIG. 3A is a block diagram illustrating an example 300 of a video decoding operation that can be implemented in the system of FIG. 1. As illustrated, the video decoder 124 is coupled to the memory 110 and to the cache 150, and the video decoder 124 receives the bitstream 222 (e.g., the encoded video data 122) and reference information, illustrated as the reference frame 230, from the memory 110. The video decoder 124 processes the bitstream 222 and the reference frame 230 to generate the decoded frame 130 and the downscaled frame 132. The video decoder 124 stores the decoded frame 130 into the memory 110 and stores the downscaled frame 132 into the cache 150. The display unit 140 retrieves the downscaled frame 132 from the cache 150 and processes the downscaled frame 132 to generate the video data output 142, which is provided to the display device 104.

Optionally, the downscaled frame 132 may be removed from the cache 150 and stored in the memory 110, such as based on the usage and/or configuration of the cache 150, and the downscaled frame 132 may be retrieved from the memory 110 responsive to a request from the display unit 140. Such transfers of downscaled pixels (e.g., the downscaled frame 132) between the cache 150 and the memory 110 are illustrated as dotted lines.

Optionally, the video decoder 124 may select whether to store the downscaled frame 132 into the cache 150 or into the memory 110. For example, as described above with reference to FIG. 1, the video decoder 124 may make the selection based on a size and/or availability of the cache 150, based on a user setting, based on a configuration setting (e.g., indicating a tier of the device 102), or any combination thereof.

Optionally, the video decoder 124 may select whether to store the decoded frame 130 into the memory 110 based on whether the decoded frame 130 is a reference frame. For example, as described above, the bitstream parsing unit 210 may extract information from the bitstream 222 indicating whether the decoded frame 130 is a reference frame. After the decoded frame 130 has been processed by the inline downscaler 126 to generate the downscaled frame 132, if the decoded frame 130 is not a reference frame, the video decoder 124 may overwrite or erase the decoded frame 130 from the reference picture buffer 218 without outputting the decoded frame 130 to the cache 150 or to the memory 110.
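The reference-frame-dependent write-back described above may be sketched as follows. The dictionary standing in for the memory 110, the frame identifiers, and the function name are hypothetical and are used only for illustration.

```python
def write_back_decoded_frame(frame_id, pixels, is_reference, memory):
    """Store a full-resolution frame only when later frames may reference it.

    Returns True if the frame was written to `memory` (a dict standing in
    for off-chip memory) and False if the write was skipped.
    """
    if is_reference:
        memory[frame_id] = pixels
        return True
    # Non-reference frame: the downscaled copy suffices for display, so the
    # full-resolution pixels are dropped without a memory write.
    return False


memory_model = {}
stored_ref = write_back_decoded_frame("frame0", [[1, 2], [3, 4]], True, memory_model)
stored_non_ref = write_back_decoded_frame("frame1", [[5, 6], [7, 8]], False, memory_model)
```

Skipping the write for non-reference frames avoids memory traffic for full-resolution pixels that would never be read again.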

FIG. 3B is a block diagram illustrating an example 350 of a video decoding operation that can be implemented in the system of FIG. 1. In the example 350, the cache 150 is omitted from the decoding dataflow. The video decoder 124 stores the downscaled frame 132 to the memory 110, and the display unit 140 retrieves the downscaled frame 132 from the memory 110.

In some implementations, the video decoder 124 is configured to always bypass the cache 150, such as in accordance with a configuration parameter indicating that the video decoder 124 is implemented at a value tier chip that has a smaller cache 150 as compared to a premium tier chip. In other implementations, the determination of whether to bypass the cache 150 can be made occasionally or periodically, such as on a frame-by-frame basis, and may be based on a usage of the cache 150 by other processes, an amount of available storage capacity in the cache 150, a relative priority of the video playback operation as compared to other ongoing processes that may use the cache 150, one or more other factors, or a combination thereof.
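A per-frame selection of this kind might be sketched as shown below. The `CacheState` fields, the capacity comparison, and the returned labels are hypothetical simplifications; an actual policy could also weigh process priority and other factors listed above.

```python
from dataclasses import dataclass


@dataclass
class CacheState:
    """Hypothetical snapshot of cache conditions consulted once per frame."""
    free_bytes: int
    always_bypass: bool = False  # e.g., set by a configuration parameter


def select_downscaled_frame_target(frame_bytes, cache):
    """Return "cache" or "memory" as the destination for a downscaled frame."""
    if cache.always_bypass:
        return "memory"
    if frame_bytes <= cache.free_bytes:
        return "cache"
    return "memory"


frame_size = 1280 * 720 * 3 // 2  # bytes in a 720p frame with 4:2:0 sampling
target_roomy = select_downscaled_frame_target(frame_size, CacheState(free_bytes=4_000_000))
target_full = select_downscaled_frame_target(frame_size, CacheState(free_bytes=500_000))
target_bypass = select_downscaled_frame_target(
    frame_size, CacheState(free_bytes=4_000_000, always_bypass=True))
```

Evaluating the function once per frame corresponds to the frame-by-frame determination, while the `always_bypass` flag models the fixed configuration case.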

Optionally, the video decoder 124 may select whether to store the decoded frame 130 into the memory 110 or to discard the decoded frame 130 based on whether the decoded frame 130 is a reference frame, such as described above with reference to FIG. 3A.

FIG. 4 is a block diagram illustrating an example 400 of a video decoding operation that can be implemented in the system of FIG. 1 . In the example 400, a first portion 428A of the downscaled frame 132 is stored in the memory 110, and a second portion 428B of the downscaled frame 132 is stored in the cache 150. To illustrate, in some implementations, the video decoder 124 is configured to store the first portion 428A into the memory 110 and to store the second portion 428B into the cache 150. In some implementations, a memory system of the device 102 determines whether to store the first portion 428A in the cache 150 or the memory 110 and whether to store the second portion 428B in the cache 150 or the memory 110. In a particular implementation, the memory system may transfer one or more portions of the downscaled frame 132 between the memory 110 and the cache 150 in accordance with a cache management policy, a priority policy, an available capacity of the cache 150, a size of a portion of the cache 150 allotted to storage of display pixels, one or more other factors, or any combination thereof.
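The split storage of FIG. 4, together with the recombination performed when the frame is read for display, can be sketched as follows. Splitting by rows and sizing the cached portion from a row budget are assumptions made for illustration, as are the function names.

```python
def split_downscaled_frame(frame_rows, cache_row_budget):
    """Split a frame into a first portion (for memory) and a second portion (for cache).

    `cache_row_budget` is a hypothetical count of rows the cache can hold;
    the trailing rows are assigned to the cache, the rest to memory.
    """
    cached = min(max(cache_row_budget, 0), len(frame_rows))
    split_index = len(frame_rows) - cached
    first_portion = frame_rows[:split_index]   # stored in memory
    second_portion = frame_rows[split_index:]  # stored in cache
    return first_portion, second_portion


def combine_portions(first_portion, second_portion):
    """Reassemble the frame in row order, as a display unit might when reading it."""
    return first_portion + second_portion


downscaled = [[1, 1], [2, 2], [3, 3], [4, 4]]
portion_a, portion_b = split_downscaled_frame(downscaled, cache_row_budget=1)
reassembled = combine_portions(portion_a, portion_b)
```

Because the portions are contiguous row ranges, recombining them requires only concatenation in the original order.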

FIG. 5 is a block diagram illustrating an implementation 500 of the device 102 as an integrated circuit 502 for performing video decoding using a video decoder with an inline downscaler. The integrated circuit 502 includes the one or more processors 116, which include the video decoder 124 with the inline downscaler 126. Optionally, the integrated circuit 502 also includes the memory 110, the encoded data source 120, the cache 150, the display unit 140, the GPU 160, or any combination thereof. The integrated circuit 502 also includes a signal input 504, such as a bus interface, to enable the encoded video data 122 to be received. The integrated circuit 502 includes a signal output 506, such as a bus interface, to enable outputting a video data output 526, such as the video data output 142 or a sequence of downscaled frames including the downscaled frame 132. The integrated circuit 502 enables implementation of video decoding using the inline downscaler 126 as a component in a system that performs video decoding playback, such as depicted in FIG. 1 .

FIG. 6 depicts an implementation 600 in which the device 102 includes a mobile device 602, such as a phone or tablet, as illustrative, non-limiting examples. The mobile device 602 includes a display screen 604. The video decoder 124 with the inline downscaler 126 is integrated in the mobile device 602, such as in the integrated circuit 502 that is illustrated using dashed lines to indicate internal components that are not generally visible to a user of the mobile device 602. In a particular example, the video decoder 124 operates to perform video decoding using the inline downscaler 126. For example, the mobile device 602 may receive encoded video data from a remote device (e.g., a phone or computer device of another participant on a video conference), decode the encoded video data using the video decoder 124 including the inline downscaler 126, and display the resulting decoded video at the display screen 604.

FIG. 7 depicts an implementation 700 in which the device 102 includes a portable electronic device that corresponds to a camera device 702. The video decoder 124 with the inline downscaler 126 is integrated in the camera device 702, such as in the integrated circuit 502. During operation, the video decoder 124 performs video decoding using the inline downscaler 126 during playback of video data via a display of the camera device 702, such as video data that is captured by the camera device 702 and stored as encoded video data at a memory of the camera device 702.

FIG. 8 depicts an implementation 800 of a wearable electronic device 802, illustrated as a “smart watch.” In a particular aspect, the wearable electronic device 802 includes the device 102. The video decoder 124 with the inline downscaler 126 is integrated in the wearable electronic device 802, such as in the integrated circuit 502. In a particular aspect, the wearable electronic device 802 is coupled to or includes a display screen 804 to display video data decoded by the video decoder 124, and the video decoder 124 operates to perform video decoding using the inline downscaler 126. In a particular example, the wearable electronic device 802 includes a haptic device that provides a haptic notification (e.g., vibrates) associated with playback of decoded video data via the display screen 804. For example, the haptic notification can cause a user to look at the wearable electronic device 802 to watch video playback, such as a video announcement of an incoming video phone call or a video message received at the wearable electronic device 802.

FIG. 9 depicts an implementation 900 in which the device 102 includes a portable electronic device that corresponds to an extended reality device, such as augmented reality or mixed reality glasses 902. The glasses 902 include a holographic projection unit 904 configured to project visual data onto a surface of a lens 906 or to reflect the visual data off of a surface of the lens 906 and onto the wearer's retina. The video decoder 124 with the inline downscaler 126 is integrated in the glasses 902, such as in the integrated circuit 502. In a particular aspect, the video decoder 124 operates to perform video decoding using the inline downscaler 126 during playback of video data via a projection onto the surface of the lens 906 (e.g., the display device 104) to enable display of video associated with augmented reality, mixed reality, or virtual reality scenes to the user while the glasses 902 are worn.

FIG. 10 depicts an implementation 1000 of a portable electronic device that corresponds to a virtual reality, augmented reality, or mixed reality headset 1002. In a particular aspect, the headset 1002 includes the device 102 of FIG. 1 . The video decoder 124 with the inline downscaler 126 is integrated in the headset 1002, such as in the integrated circuit 502. In a particular aspect, the video decoder 124 operates to perform video decoding using the inline downscaler 126 during playback of video data via a visual interface device 1004 (e.g., the display device 104). The visual interface device 1004 is positioned in front of the user's eyes to enable display of video associated with augmented reality, mixed reality, or virtual reality scenes to the user while the headset 1002 is worn.

FIG. 11 depicts an implementation 1100 of a wireless speaker and voice activated device 1102. In a particular aspect, the wireless speaker and voice activated device 1102 includes the device 102 of FIG. 1. The wireless speaker and voice activated device 1102 can have wireless network connectivity and is configured to execute an assistant operation. The one or more processors 116 are included in the wireless speaker and voice activated device 1102 and include the video decoder 124. In a particular aspect, the wireless speaker and voice activated device 1102 includes one or more microphones 1110 and one or more speakers 1104, and also includes or is coupled to a display device 1120 for playback of video that is output by the video decoder 124. During operation, the video decoder 124 performs video decoding using the inline downscaler 126 during playback of video data via the display device 1120. In response to receiving a verbal command via the one or more microphones 1110, the wireless speaker and voice activated device 1102 can execute assistant operations, such as via execution of a voice activation system (e.g., an integrated assistant application). The assistant operations can include adjusting a temperature, playing media content such as stored or streaming audio and video content, turning on lights, etc. For example, the assistant operations are performed responsive to receiving a command after a keyword or key phrase (e.g., “hello assistant”).

FIG. 12 depicts an implementation 1200 in which the device 102 corresponds to or is integrated within a vehicle 1202, illustrated as a manned or unmanned aerial device (e.g., a package delivery drone). The video decoder 124 with the inline downscaler 126 is integrated in the vehicle 1202, such as in the integrated circuit 502. The vehicle 1202 also includes a display device 1204 configured to display an output based on downscaled video frames generated by the inline downscaler 126, such as the video data output 142.

The video decoder 124 operates to perform video decoding using the inline downscaler 126 during playback of video data that is decoded by the video decoder 124 and played back via the display device 1204. In some implementations, the vehicle 1202 is manned (e.g., carries a pilot, one or more passengers, or both), the display device 1204 is internal to a cabin of the vehicle 1202, and the video decoding using the inline downscaler 126 is performed during playback to a pilot or a passenger of the vehicle 1202. In another implementation, the vehicle 1202 is unmanned, the display device 1204 is mounted to an external surface of the vehicle 1202, and the video decoding using the inline downscaler 126 is performed during video playback to one or more viewers external to the vehicle 1202. For example, the vehicle 1202 may move (e.g., circle an outdoor audience during a concert) while playing out video such as advertisements or streaming video of the concert stage, and the one or more processors 116 (e.g., including the video decoder 124) may perform video decoding using the inline downscaler 126 to generate the video from an encoded video stream.

FIG. 13 depicts an implementation 1300 in which the device 102 corresponds to, or is integrated within, a vehicle 1302, illustrated as a car. The video decoder 124 with the inline downscaler 126 is integrated in the vehicle 1302, such as in the integrated circuit 502. The vehicle 1302 also includes a display device 1320 and one or more speakers 1310. In some implementations, the display device 1320 is configured to display video data output based on downscaled video frames generated by the inline downscaler 126, such as the video data output 142. For example, the video data may correspond to streaming video data from a remote source (e.g., a remote media server), video stored at the vehicle 1302, such as entertainment content or instructional videos regarding operation of the vehicle 1302, or video captured via one or more camera sensors of the vehicle 1302, such as a backup camera.

FIG. 14 illustrates an example of a method 1400 of decoding video data. One or more operations of the method 1400 may be performed by the system 100 of FIG. 1 (e.g., the device 102, the one or more processors 116, or the video decoder 124), as an illustrative, non-limiting example.

The method 1400 includes, at block 1402, obtaining, at a video decoder, an input frame of video data. For example, the first input frame 162 of FIG. 1 is received at the video decoder 124.

The method 1400 includes, at block 1404, decoding, at the video decoder, the input frame to generate a first video frame. For example, the video decoder 124 decodes the first input frame 162 to generate the decoded frame 130.

The method 1400 includes, at block 1406, generating, at an inline downscaler of the video decoder, a second video frame corresponding to the first video frame downscaled for display output. For example, the downscaled frame 132 is generated at the inline downscaler 126 of the video decoder 124.
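The ordering of blocks 1402 through 1406 can be summarized with the following sketch, in which the decode and downscale stages are passed in as callables. The toy stand-ins are hypothetical and do not perform real decoding or filtering; the decimating stand-in merely keeps every second sample.

```python
def run_method_1400(input_frame, decode, downscale):
    """Obtain an input frame, decode it, then downscale the decoded result."""
    first_video_frame = decode(input_frame)             # block 1404
    second_video_frame = downscale(first_video_frame)   # block 1406
    return first_video_frame, second_video_frame


def toy_decode(payload):
    """Stand-in decoder: copies the payload as if it were decoded pixels."""
    return [list(row) for row in payload]


def toy_downscale(frame):
    """Stand-in downscaler: keeps every second row and column."""
    return [row[::2] for row in frame[::2]]


first_frame, second_frame = run_method_1400(
    [[1, 2, 3, 4], [5, 6, 7, 8]], toy_decode, toy_downscale)
```

Both frames are returned together so that a caller can route the full-resolution frame and the downscaled frame to different destinations.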

In some implementations, the method 1400 includes outputting the first video frame and the second video frame in parallel. For example, the video decoder 124 may output the downscaled frame 132 in parallel with outputting the decoded frame 130.

In some implementations, the method 1400 includes storing the first video frame into a memory and storing the second video frame into a cache. For example, as illustrated in FIG. 1 , the decoded frame 130 may be stored into the memory 110 and the downscaled frame 132 may be stored into the cache 150. According to an aspect, the first video frame is stored into a memory based on a determination that the first video frame is a reference frame, and storage of full-resolution non-reference frames to the memory is skipped.

In some implementations, the method 1400 includes selecting between storing the second video frame into a memory or into a cache. For example, as explained with reference to FIG. 3A and FIG. 3B, the video decoder 124 may select between storing the downscaled frame 132 into the memory 110 or into the cache 150.

In some implementations, the method 1400 includes storing a first portion of the second video frame into a memory and storing a second portion of the second video frame into a cache. For example, as explained with reference to FIG. 4, the first portion 428A of the downscaled frame 132 may be stored into the memory 110 and the second portion 428B of the downscaled frame 132 may be stored into the cache 150.

In some implementations, the method 1400 includes receiving the second video frame at a display unit and generating an output to a display device, such as the display unit 140 receiving the downscaled frame 132 and generating the video data output 142 to the display device 104. For example, the second video frame may be received from a cache, such as the cache 150. As another example, the second video frame may be received from a memory, such as the memory 110. According to some aspects, a first portion of the second video frame is received from a memory and a second portion of the second video frame is received from a cache. For example, the display unit 140 may receive the first portion 428A of the downscaled frame 132 from the memory 110 and may receive the second portion 428B of the downscaled frame 132 from the cache 150.

The method 1400 of FIG. 14 may be implemented by a field-programmable gate array (FPGA) device, an application-specific integrated circuit (ASIC), a processing unit such as a central processing unit (CPU), a digital signal processor (DSP), a controller, another hardware device, firmware device, or any combination thereof. As an example, the method 1400 of FIG. 14 may be performed by a processor that executes instructions, such as described with reference to FIG. 15.

Referring to FIG. 15, a block diagram of a particular illustrative implementation of a device is depicted and generally designated 1500. In various implementations, the device 1500 may have more or fewer components than illustrated in FIG. 15. In an illustrative implementation, the device 1500 may correspond to the device 102 of FIG. 1. In an illustrative implementation, the device 1500 may perform one or more operations described with reference to FIGS. 1-14.

In a particular implementation, the device 1500 includes a processor 1506 (e.g., a CPU). The device 1500 may include one or more additional processors 1510 (e.g., one or more DSPs). In a particular implementation, the one or more processors 116 of FIG. 1 correspond to the processor 1506, the processors 1510, or a combination thereof. For example, the processors 1510 may include the video decoder 124, the cache 150, the display unit 140, and a speech and music coder-decoder (CODEC) 1508. The speech and music CODEC 1508 may include a voice coder (“vocoder”) encoder 1536, a vocoder decoder 1538, or a combination thereof.

The device 1500 may include a memory 1586 and a CODEC 1534. The memory 1586 may include instructions 1556 that are executable by the one or more additional processors 1510 (or the processor 1506) to implement the functionality described with reference to the video decoder 124. In a particular example, the memory 1586 corresponds to the memory 110 and the instructions 1556 correspond to the instructions 112 of FIG. 1 . The device 1500 may include the modem 118 coupled, via a transceiver 1550, to an antenna 1552.

The device 1500 may include a display 1528, such as the display device 104, coupled to a display controller 1526. One or more speakers 1592, one or more microphones 1590, or a combination thereof, may be coupled to the CODEC 1534. The CODEC 1534 may include a digital-to-analog converter (DAC) 1502 and an analog-to-digital converter (ADC) 1504. In a particular implementation, the CODEC 1534 may receive analog signals from the microphones 1590, convert the analog signals to digital signals using the analog-to-digital converter 1504, and send the digital signals to the speech and music CODEC 1508. In a particular implementation, the speech and music CODEC 1508 may provide digital signals to the CODEC 1534. The CODEC 1534 may convert the digital signals to analog signals using the digital-to-analog converter 1502 and may provide the analog signals to the speakers 1592.

In a particular implementation, the device 1500 may be included in a system-in-package or system-on-chip device 1522. In a particular implementation, the memory 1586, the processor 1506, the processors 1510, the display controller 1526, the CODEC 1534, and the modem 118 are included in a system-in-package or system-on-chip device 1522. In a particular implementation, an input device 1530 (e.g., a keyboard, a touchscreen, or a pointing device) and a power supply 1544 are coupled to the system-in-package or system-on-chip device 1522. Moreover, in a particular implementation, as illustrated in FIG. 15 , the display 1528, the input device 1530, the speakers 1592, the microphones 1590, the antenna 1552, and the power supply 1544 are external to the system-in-package or system-on-chip device 1522. In a particular implementation, each of the display 1528, the input device 1530, the speakers 1592, the microphones 1590, the antenna 1552, and the power supply 1544 may be coupled to a component of the system-in-package or system-on-chip device 1522, such as an interface or a controller.

The device 1500 may include a smart speaker, a speaker bar, a mobile communication device, a smart phone, a cellular phone, a laptop computer, a computer, a tablet, a personal digital assistant, a display device, a television, a gaming console, a music player, a radio, a digital video player, a digital video disc (DVD) player, a tuner, a camera, a navigation device, a vehicle, a headset, an augmented reality headset, a mixed reality headset, a virtual reality headset, an aerial vehicle, a home automation system, a voice-activated device, a wireless speaker and voice activated device, a portable electronic device, a car, a computing device, a communication device, an internet-of-things (IoT) device, a virtual reality (VR) device, a base station, a mobile device, or any combination thereof.

In conjunction with the described techniques, an apparatus includes means for obtaining an input frame of video data. In an example, the means for obtaining an input frame of video data includes the video decoder 124, the one or more processors 116, the device 102, the system 100, the bitstream parsing unit 210, one or more other circuits or devices to obtain an input frame of video data, or a combination thereof.

The apparatus includes means for decoding the input frame to generate a first video frame, the means for decoding including means for inline downscaling to generate a second video frame corresponding to the first video frame downscaled for display output. In an example, the means for decoding the input frame includes the video decoder 124, the one or more processors 116, the device 102, the system 100, the bitstream parsing unit 210, the pixel prediction processing unit 212, the inverse transform processing unit 214, the pixel reconstruction and inloop filtering unit 216, the reference picture buffer 218, one or more other circuits or devices to decode the input frame to generate a first video frame, or a combination thereof. In an example, the means for inline downscaling includes the inline downscaler 126, one or more other circuits or devices to generate a second video frame corresponding to the first video frame downscaled for display output, or a combination thereof.

In some implementations, a non-transitory computer-readable medium (e.g., a computer-readable storage device, such as the memory 110) includes instructions (e.g., the instructions 112) that, when executed by one or more processors that include a video decoder (e.g., the one or more processors 116 that include the video decoder 124), cause the one or more processors to perform operations corresponding to at least a portion of any of the techniques described with reference to FIGS. 1-13 , the method of FIG. 14 , or any combination thereof. For example, the instructions, when executed by the one or more processors, cause the one or more processors to decode, at the video decoder, an input frame of video data (e.g., the first input frame 162 of the encoded video data 122) to generate a first video frame (e.g., the decoded frame 130), and generate, at an inline downscaler of the video decoder (e.g., the inline downscaler 126), a second video frame (e.g., the downscaled frame 132) corresponding to the first video frame downscaled for display output.
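As a non-limiting illustration, the operations above (decoding an input frame to generate a first video frame and generating, inline, a second video frame downscaled for display output) may be sketched in software as follows. The sketch rests on assumptions that are not part of the disclosure: the frame is modeled as rows of integer samples, bitstream decoding is stubbed out so that `reconstructed` stands in for the reconstructed samples, and the downscaler uses a fixed 2:1 ratio with a 2x2 averaging filter. The names `Frame`, `downscale_2x`, and `decode_with_inline_downscale` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical frame representation: rows of integer sample values.
Frame = List[List[int]]


def downscale_2x(frame: Frame) -> Frame:
    """Average each 2x2 block of samples, rounding to nearest.

    The 2:1 ratio and the averaging filter are assumptions chosen for
    illustration. A trailing odd row or column is dropped.
    """
    height, width = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


def decode_with_inline_downscale(reconstructed: Frame) -> Tuple[Frame, Frame]:
    """Return (first video frame, second video frame) from one pass.

    Decoding is stubbed: `reconstructed` stands in for the samples that a
    decoder would produce from the input frame. The downscaled frame is
    derived from those samples directly, without a separate read-back pass.
    """
    first_video_frame = reconstructed
    second_video_frame = downscale_2x(reconstructed)
    return first_video_frame, second_video_frame
```

In this sketch both frames are returned from a single call, which loosely mirrors the decoder making the first video frame and the second video frame available together rather than producing the downscaled frame in a later pass.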

Particular aspects of the disclosure are described below in the following sets of interrelated Examples:

According to Example 1, a device includes a memory configured to store video data; and a video decoder coupled to the memory and to a cache, the video decoder configured to decode an input frame of the video data to generate a first video frame and including an inline downscaler configured to generate a second video frame corresponding to the first video frame downscaled for display output.

Example 2 includes the device of Example 1, wherein the video decoder is configured to output the first video frame and the second video frame in parallel.

Example 3 includes the device of Example 1 or Example 2, wherein the video decoder is configured to select between storing the second video frame into the memory or into the cache.

Example 4 includes the device of any of Examples 1 to 3, wherein the first video frame is stored into the memory, and wherein the second video frame is stored into the cache.

Example 5 includes the device of any of Examples 1 to 4, wherein the first video frame is stored into the memory based on a determination that the first video frame is a reference frame, and wherein storage of full-resolution non-reference frames to the memory is skipped.

Example 6 includes the device of Example 1 or Example 2, wherein the video decoder is configured to store a first portion of the second video frame into the memory and to store a second portion of the second video frame into the cache.

Example 7 includes the device of any of Examples 1 to 6, and further includes a display unit configured to receive the second video frame and to generate an output to a display device.

Example 8 includes the device of Example 7, wherein the display unit is configured to receive the second video frame from the cache.

Example 9 includes the device of Example 7 or Example 8, wherein the display unit is configured to receive the second video frame from the memory.

Example 10 includes the device of any of Examples 7 to 9, wherein the display unit is configured to selectively receive the second video frame from the memory or from the cache.

Example 11 includes the device of Example 7, wherein the display unit is configured to receive a first portion of the second video frame from the memory and a second portion of the second video frame from the cache.

Example 12 includes the device of any of Examples 1 to 10, and further includes a display device configured to display an output based on the second video frame.

Example 13 includes the device of any of Examples 1 to 12, and further includes one or more processors that include the video decoder.

Example 14 includes the device of Example 13, and further includes a modem coupled to the one or more processors and configured to receive the video data.

Example 15 includes the device of Example 13 or Example 14, wherein the one or more processors are integrated in an extended reality headset device that is configured to display an output based on the second video frame.

Example 16 includes the device of Example 13 or Example 14, wherein the one or more processors are integrated in at least one of a mobile phone, a tablet computer device, or a wearable electronic device.

Example 17 includes the device of Example 13 or Example 14, wherein the one or more processors are integrated in a mobile phone.

Example 18 includes the device of Example 13 or Example 14, wherein the one or more processors are integrated in a tablet computer device.

Example 19 includes the device of Example 13 or Example 14, wherein the one or more processors are integrated in a wearable electronic device.

Example 20 includes the device of Example 13 or Example 14, wherein the one or more processors are integrated in a vehicle, the vehicle further including a display device configured to display an output based on the second video frame.

Example 21 includes the device of any of Examples 13 to 20, wherein the one or more processors are included in an integrated circuit.

According to Example 22, a method of processing video data includes obtaining, at a video decoder, an input frame of video data; decoding, at the video decoder, the input frame to generate a first video frame; and generating, at an inline downscaler of the video decoder, a second video frame corresponding to the first video frame downscaled for display output.

Example 23 includes the method of Example 22, and further includes outputting the first video frame and the second video frame in parallel.

Example 24 includes the method of Example 22 or Example 23, and further includes selecting between storing the second video frame into a memory or into a cache.

Example 25 includes the method of any of Examples 22 to 24, further including storing the first video frame into a memory and storing the second video frame into a cache.

Example 26 includes the method of any of Examples 22 to 25, wherein the first video frame is stored into a memory based on a determination that the first video frame is a reference frame, and wherein storage of full-resolution non-reference frames to the memory is skipped.

Example 27 includes the method of Example 22 or Example 23, and further includes storing a first portion of the second video frame into a memory and storing a second portion of the second video frame into a cache.

Example 28 includes the method of any of Examples 22 to 27, and further includes receiving the second video frame at a display unit and generating an output to a display device.

Example 29 includes the method of Example 28, wherein the second video frame is received from a cache.

Example 30 includes the method of Example 28, wherein the second video frame is received from a memory.

Example 31 includes the method of Example 28, wherein a first portion of the second video frame is received from a memory and a second portion of the second video frame is received from a cache.

According to Example 32, a non-transitory computer-readable medium includes instructions that, when executed by one or more processors that include a video decoder, cause the one or more processors to: decode, at the video decoder, an input frame of video data to generate a first video frame; and generate, at an inline downscaler of the video decoder, a second video frame corresponding to the first video frame downscaled for display output.

Example 33 includes the non-transitory computer-readable medium of Example 32, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to output the first video frame and the second video frame in parallel.

Example 34 includes the non-transitory computer-readable medium of Example 32 or Example 33, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to select between storing the second video frame into a memory or into a cache.

Example 35 includes the non-transitory computer-readable medium of any of Examples 32 to 34, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to store the first video frame into a memory and store the second video frame into a cache.

Example 36 includes the non-transitory computer-readable medium of any of Examples 32 to 35, wherein the first video frame is stored into a memory based on a determination that the first video frame is a reference frame, and wherein storage of full-resolution non-reference frames to the memory is skipped.

Example 37 includes the non-transitory computer-readable medium of Example 32 or Example 33, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to store a first portion of the second video frame into a memory and store a second portion of the second video frame into a cache.

Example 38 includes the non-transitory computer-readable medium of any of Examples 32 to 37, wherein the instructions, when executed by the one or more processors, further cause the one or more processors to receive the second video frame at a display unit and generate an output to a display device.

Example 39 includes the non-transitory computer-readable medium of Example 38, wherein the second video frame is received from a cache.

Example 40 includes the non-transitory computer-readable medium of Example 38, wherein the second video frame is received from a memory.

Example 41 includes the non-transitory computer-readable medium of Example 38, wherein a first portion of the second video frame is received from a memory and a second portion of the second video frame is received from a cache.

According to Example 42, an apparatus includes means for obtaining an input frame of video data; and means for decoding the input frame to generate a first video frame, the means for decoding including means for inline downscaling to generate a second video frame corresponding to the first video frame downscaled for display output.

Example 43 includes the apparatus of Example 42, and further includes means for outputting the first video frame and the second video frame in parallel.

Example 44 includes the apparatus of Example 42 or Example 43, and further includes means for selecting between storing the second video frame into a memory or into a cache.

Example 45 includes the apparatus of any of Examples 42 to 44, and further includes means for storing the first video frame into a memory and storing the second video frame into a cache.

Example 46 includes the apparatus of any of Examples 42 to 45, wherein the first video frame is stored into a memory based on a determination that the first video frame is a reference frame, and wherein storage of full-resolution non-reference frames to the memory is skipped.

Example 47 includes the apparatus of Example 42 or Example 43, and further includes means for storing a first portion of the second video frame into a memory and storing a second portion of the second video frame into a cache.

Example 48 includes the apparatus of any of Examples 42 to 47, and further includes means for receiving the second video frame at a display unit and generating an output to a display device.

Example 49 includes the apparatus of Example 48, wherein the second video frame is received from a cache.

Example 50 includes the apparatus of Example 48, wherein the second video frame is received from a memory.

Example 51 includes the apparatus of Example 48, wherein a first portion of the second video frame is received from a memory and a second portion of the second video frame is received from a cache.
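As a non-limiting illustration of the storage behavior recited in several of the Examples above (storing the first video frame into the memory based on a reference-frame determination, storing the second video frame into the cache, into the memory, or partly into each, and receiving the second video frame at a display unit), a software sketch follows. Everything in it is an assumption made for illustration: the memory and the cache are modeled as dictionaries, frames as byte strings, and the policy for splitting a frame (fill the remaining cache capacity first, place the remainder in memory) is only one possible choice. The names `Storage`, `store_frames`, and `fetch_for_display` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Storage:
    """Toy model: `memory` stands in for DRAM, `cache` for the cache."""
    memory: Dict[str, bytes] = field(default_factory=dict)
    cache: Dict[str, bytes] = field(default_factory=dict)
    cache_capacity: int = 8  # bytes; hypothetical budget


def store_frames(storage: Storage, frame_id: str, first_frame: bytes,
                 second_frame: bytes, is_reference: bool) -> None:
    # The full-resolution frame is written to memory only when it is a
    # reference frame; otherwise that write is skipped.
    if is_reference:
        storage.memory[f"{frame_id}/full"] = first_frame

    # Assumed split policy: the leading portion of the downscaled frame goes
    # to whatever cache capacity remains, and the rest goes to memory.
    used = sum(len(part) for part in storage.cache.values())
    room = max(storage.cache_capacity - used, 0)
    split = min(room, len(second_frame))
    if split:
        storage.cache[f"{frame_id}/scaled"] = second_frame[:split]
    if split < len(second_frame):
        storage.memory[f"{frame_id}/scaled"] = second_frame[split:]


def fetch_for_display(storage: Storage, frame_id: str) -> bytes:
    """Reassemble the downscaled frame from the cache, the memory, or both."""
    key = f"{frame_id}/scaled"
    return storage.cache.get(key, b"") + storage.memory.get(key, b"")
```

For instance, with a four-byte cache budget, a six-byte downscaled frame would be split so that four bytes land in the cache and two in the memory, and `fetch_for_display` would return the six bytes in their original order.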

Those of skill in the art would further appreciate that the various illustrative logical blocks, configurations, circuits, and algorithm steps described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software executed by a processing device such as a hardware processor, or combinations of both. Various illustrative components, blocks, configurations, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or executable software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The steps of a method or algorithm described in connection with the implementations disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in a memory device, such as random access memory (RAM), magnetoresistive random access memory (MRAM), spin-torque transfer MRAM (STT-MRAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of non-transient storage medium known in the art. An exemplary memory device is coupled to the processor such that the processor can read information from, and write information to, the memory device. In the alternative, the memory device may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or a user terminal.

The previous description of the disclosed implementations is provided to enable a person skilled in the art to make or use the disclosed implementations. Various modifications to these implementations will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other implementations without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the implementations shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims. 

What is claimed is:
 1. A device comprising: a memory configured to store video data; and a video decoder coupled to the memory and to a cache, the video decoder configured to decode an input frame of the video data to generate a first video frame and including an inline downscaler configured to generate a second video frame corresponding to the first video frame downscaled for display output.
 2. The device of claim 1, wherein the video decoder is configured to output the first video frame and the second video frame in parallel.
 3. The device of claim 1, wherein the video decoder is configured to select between storing the second video frame into the memory or into the cache.
 4. The device of claim 1, wherein the video decoder is configured to store a first portion of the second video frame into the memory and to store a second portion of the second video frame into the cache.
 5. The device of claim 1, wherein the first video frame is stored into the memory, and wherein the second video frame is stored into the cache.
 6. The device of claim 1, wherein the first video frame is stored into the memory based on a determination that the first video frame is a reference frame, and wherein storage of full-resolution non-reference frames to the memory is skipped.
 7. The device of claim 1, further comprising a display unit configured to receive the second video frame and to generate an output to a display device.
 8. The device of claim 7, wherein the display unit is configured to receive the second video frame from the cache.
 9. The device of claim 7, wherein the display unit is configured to receive the second video frame from the memory.
 10. The device of claim 7, wherein the display unit is configured to selectively receive the second video frame from the memory or from the cache.
 11. The device of claim 7, wherein the display unit is configured to receive a first portion of the second video frame from the memory and a second portion of the second video frame from the cache.
 12. The device of claim 1, further comprising a display device configured to display an output based on the second video frame.
 13. The device of claim 1, further comprising one or more processors that include the video decoder.
 14. The device of claim 13, further comprising a modem coupled to the one or more processors and configured to receive the video data.
 15. The device of claim 13, wherein the one or more processors are integrated in an extended reality headset device that is configured to display an output based on the second video frame.
 16. The device of claim 13, wherein the one or more processors are integrated in at least one of a mobile phone, a tablet computer device, or a wearable electronic device.
 17. The device of claim 13, wherein the one or more processors are integrated in a vehicle, the vehicle further including a display device configured to display an output based on the second video frame.
 18. The device of claim 13, wherein the one or more processors are included in an integrated circuit.
 19. A method of processing video data comprising: obtaining, at a video decoder, an input frame of video data; decoding, at the video decoder, the input frame to generate a first video frame; and generating, at an inline downscaler of the video decoder, a second video frame corresponding to the first video frame downscaled for display output.
 20. The method of claim 19, further comprising outputting the first video frame and the second video frame in parallel.
 21. The method of claim 19, further comprising selecting between storing the second video frame into a memory or into a cache.
 22. The method of claim 19, further comprising storing a first portion of the second video frame into a memory and storing a second portion of the second video frame into a cache.
 23. The method of claim 19, further comprising storing the first video frame into a memory and storing the second video frame into a cache.
 24. The method of claim 19, wherein the first video frame is stored into a memory based on a determination that the first video frame is a reference frame, and wherein storage of full-resolution non-reference frames to the memory is skipped.
 25. The method of claim 19, further comprising receiving the second video frame at a display unit and generating an output to a display device.
 26. The method of claim 25, wherein the second video frame is received from a cache.
 27. The method of claim 25, wherein the second video frame is received from a memory.
 28. The method of claim 25, wherein a first portion of the second video frame is received from a memory and a second portion of the second video frame is received from a cache.
 29. A non-transitory computer-readable medium comprising instructions that, when executed by one or more processors that include a video decoder, cause the one or more processors to: decode, at the video decoder, an input frame of video data to generate a first video frame; and generate, at an inline downscaler of the video decoder, a second video frame corresponding to the first video frame downscaled for display output.
 30. An apparatus comprising: means for obtaining an input frame of video data; and means for decoding the input frame to generate a first video frame, the means for decoding including means for inline downscaling to generate a second video frame corresponding to the first video frame downscaled for display output. 